A spatiotemporal analysis of the major types of crime in London
Preparation
Number of words: around 1,600
Runtime: within 10 minutes (memory 32 GB, CPU Intel Core Ultra 9 185H 2.50 GHz)
Coding environment: SDS Docker
License: this notebook is made available under the Creative Commons Attribution license.
No additional libraries (i.e., libraries not included in SDS Docker or not used in this module) are used in this notebook.
import time
start_time = time.time()
import pandas as pd
import geopandas as gpd
import os
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from esda.moran import Moran
from esda.getisord import G_Local
from libpysal.weights import Queen
from tabulate import tabulate
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram
from scipy.stats import chi2_contingency
Introduction
In large metropolitan areas like London, crime patterns are not randomly distributed but tend to concentrate in specific locations and fluctuate over time. Identifying these spatiotemporal patterns is crucial for developing effective crime prevention and intervention strategies. As scholars have emphasized, “understanding where and when crimes occur is essential for targeted policing and urban policy design” (Chainey et al., 2008).
Recent advancements in Geographic Information Systems (GIS) and data science have opened up new possibilities for analyzing crime dynamics. Techniques that account for both space and time allow researchers to go beyond traditional static maps and examine how crime hotspots evolve over time (Nakaya & Yano, 2010). Clustering methods, such as hierarchical clustering, have proven to be particularly effective in uncovering hidden patterns and relationships between different crime types (Joshi et al., 2017). These methods provide valuable insights into key questions, such as whether certain crime types tend to co-occur and whether such co-occurrence follows a spatial structure.
This research applies these methods to London crime data to uncover patterns and underlying trends that are not apparent from simple totals. The findings are intended to inform crime prevention policy, resource allocation, and broader urban planning initiatives.
Research questions
This study seeks to answer the following questions:
1. How are crime rates distributed over time and space?
2. Are there underlying relationships between different crime types, and are these relationships associated with crime density?
Data Description
This study primarily relies on three datasets:
- Most recent 24 months of crime data at the LSOA level
- Historical crime data at the LSOA level
- Geographic data for each LSOA
The only difference between the two crime datasets is the time period they cover. Both datasets share the same structure and include the following columns:
| Column Name | Description |
|---|---|
| LSOA Code | A unique identifier for each Lower Layer Super Output Area |
| LSOA Name | The name corresponding to each LSOA |
| Borough | A unique identifier (e.g. E09000002) for the borough in which the LSOA is located |
| Major Category | The broad category of crime |
| Minor Category | The specific subcategory of the crime |
| Monthly crime counts | One column per month, labelled `YYYYMM` (e.g. 201004), containing the crime counts for that month |
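Because every month is its own column, the crime tables are in wide format. A minimal sketch, on a small hypothetical frame that only mimics the column layout above (the second LSOA code, the minor categories and all counts are made up), of reshaping the `YYYYMM` columns into a long table and collapsing Minor Categories into annual Major Category totals per LSOA:

```python
import pandas as pd

# Hypothetical toy data mimicking the layout above: identifier columns plus one column per month (YYYYMM)
wide = pd.DataFrame({
    "LSOA Code": ["E01000006", "E01000006", "E01000007"],
    "Borough": ["E09000002", "E09000002", "E09000002"],
    "Major Category": ["BURGLARY", "BURGLARY", "THEFT"],
    "Minor Category": ["BURGLARY - RESIDENTIAL", "BURGLARY BUSINESS AND COMMUNITY", "SHOPLIFTING"],
    "201912": [1, 0, 4],
    "202001": [2, 1, 3],
    "202012": [0, 3, 5],
})

id_cols = ["LSOA Code", "Borough", "Major Category", "Minor Category"]
# Month columns are exactly the six-digit labels
month_cols = [c for c in wide.columns if c.isdigit() and len(c) == 6]

# Wide -> long: one row per LSOA, minor category and month
long_df = wide.melt(id_vars=id_cols, value_vars=month_cols, var_name="Month", value_name="Count")
long_df["Year"] = long_df["Month"].str[:4].astype(int)

# Collapse Minor Categories into annual Major Category totals per LSOA
annual = long_df.groupby(["LSOA Code", "Borough", "Major Category", "Year"], as_index=False)["Count"].sum()
print(annual)
```

In long format the year is read from the first four characters of each label, so there is no need to match column names by substring.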
What are Borough and LSOA?
Borough: In London, a borough is a local government district. There are 32 London boroughs (e.g., Camden, Hackney, Westminster), which, together with the City of London, are responsible for delivering various local public services.
LSOA (Lower Layer Super Output Area): LSOAs are small geographic areas designed for statistical reporting in England and Wales. Each borough contains many LSOAs, and each LSOA typically contains around 1,500 residents, giving a fine-grained spatial resolution for local analysis.
Crime Categories
The Major Category field includes 10 broad types of crime:
- ARSON AND CRIMINAL DAMAGE
- BURGLARY
- DRUG OFFENCES
- MISCELLANEOUS CRIMES AGAINST SOCIETY
- POSSESSION OF WEAPONS
- PUBLIC ORDER OFFENCES
- ROBBERY
- THEFT
- VEHICLE OFFENCES
- VIOLENCE AGAINST THE PERSON
Each Major Category contains several Minor Categories that describe specific types of crime in more detail. However, due to the large number of subcategories, this study focuses solely on the Major Category level for clarity and interpretability.
Geographic Data
The LSOA geographic dataset provides spatial boundaries for all LSOAs in London. This data allows for the calculation of the area of each LSOA and supports spatial analysis and mapping.
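Polygon areas are only meaningful in metres when the geometries are in a projected CRS; the coordinates printed for the LSOA layer (e.g. 532151, 181867) look like British National Grid (EPSG:27700), but that is an assumption here. A minimal sketch with two hypothetical square "LSOAs" (the codes and counts are made up) of turning geometry area into km² and then into an area-normalized crime rate:

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Two hypothetical square LSOAs in British National Grid (EPSG:27700), whose units are metres
lsoas = gpd.GeoDataFrame(
    {"lsoa21cd": ["E0TEST001", "E0TEST002"]},
    geometry=[
        box(530000, 180000, 530500, 180400),  # 500 m x 400 m = 0.2 km²
        box(531000, 180000, 532000, 181000),  # 1000 m x 1000 m = 1.0 km²
    ],
    crs="EPSG:27700",
)

# Area is only meaningful in a projected CRS; fail loudly on geographic (degree) coordinates
assert lsoas.crs is not None and lsoas.crs.is_projected
lsoas["area_km2"] = lsoas.geometry.area / 1e6  # m² -> km²

# Hypothetical annual counts for one crime category
counts = pd.DataFrame({"LSOA Code": ["E0TEST001", "E0TEST002"], "2024": [30, 30]})
merged = lsoas.merge(counts, left_on="lsoa21cd", right_on="LSOA Code", how="left")
merged["2024_area"] = merged["2024"] / merged["area_km2"]  # crimes per km²
print(merged[["lsoa21cd", "area_km2", "2024_area"]])
```

The same 30 crimes give a rate five times higher in the smaller area, which is why the maps use rates per km² rather than raw counts.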
# Import crime data
recent_df = pd.read_csv('https://raw.githubusercontent.com/YYY677/DSSS/refs/heads/main/Assessment/data/MPS%20LSOA%20Level%20Crime%20(most%20recent%2024%20months).csv')
historical_df = pd.read_csv('https://raw.githubusercontent.com/YYY677/DSSS/refs/heads/main/Assessment/data/MPS%20LSOA%20Level%20Crime%20(Historical).csv')
# Merge the historical and recent datasets; months missing for an LSOA/category combination are filled with 0
df = pd.merge(historical_df, recent_df, on=['LSOA Code', 'LSOA Name', 'Borough', 'Major Category', 'Minor Category'], how='outer').fillna(0).sort_index(axis = 1)
df.head()
| | 201004 | 201005 | 201006 | 201007 | 201008 | 201009 | 201010 | 201011 | 201012 | 201101 | ... | 202410 | 202411 | 202412 | 202501 | Borough | LSOA Code | LSOA Name | Major Category | Minor Category | Refreshed Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | E09000002 | E01000006 | Barking and Dagenham 016A | ARSON AND CRIMINAL DAMAGE | ARSON | 0 |
| 1 | 1.0 | 3.0 | 0.0 | 2.0 | 0.0 | 2.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 2.0 | 0.0 | 1.0 | E09000002 | E01000006 | Barking and Dagenham 016A | ARSON AND CRIMINAL DAMAGE | CRIMINAL DAMAGE | 05/02/2025 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | E09000002 | E01000006 | Barking and Dagenham 016A | BURGLARY | BURGLARY - RESIDENTIAL | 05/02/2025 |
| 3 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | E09000002 | E01000006 | Barking and Dagenham 016A | BURGLARY | BURGLARY BUSINESS AND COMMUNITY | 0 |
| 4 | 3.0 | 0.0 | 0.0 | 1.0 | 1.0 | 3.0 | 1.0 | 1.0 | 3.0 | 2.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | E09000002 | E01000006 | Barking and Dagenham 016A | BURGLARY | BURGLARY IN A DWELLING | 05/02/2025 |
5 rows × 184 columns
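Since the two crime files are joined on their identifier columns, a month that appeared in both files would not be combined: `pd.merge` keeps both copies with `_x`/`_y` suffixes. A quick check on hypothetical toy frames (the codes, months and counts are made up, and only two key columns are used) shows what to look for, and how `fillna(0)` zero-fills combinations present in only one file:

```python
import pandas as pd

keys = ["LSOA Code", "Major Category"]
# Hypothetical toy frames that deliberately share the month 202301
historical = pd.DataFrame({"LSOA Code": ["A", "B"], "Major Category": ["THEFT", "THEFT"],
                           "202212": [1, 2], "202301": [3, 4]})
recent = pd.DataFrame({"LSOA Code": ["A", "C"], "Major Category": ["THEFT", "THEFT"],
                       "202301": [3, 4], "202302": [5, 6]})

# Non-key columns present in both files would be duplicated by the merge
overlap = sorted((set(historical.columns) & set(recent.columns)) - set(keys))
print("Overlapping month columns:", overlap)

merged = pd.merge(historical, recent, on=keys, how="outer").fillna(0)
print(merged.columns.tolist())
```

On the real data, an empty `overlap` list confirms that the two files cover disjoint periods and each month appears as a single column.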
# Calculate the total number of crimes for each year (2011-2024)
# and aggregate the data by 'LSOA Code', 'Borough' and 'Major Category'.
df_yearly = df.copy()
# Only full years are used: 2011 to 2024 (2010 starts in April and 2025 contains only January)
years = range(2011, 2025)
# Month columns are the six-digit YYYYMM labels (e.g. '201104')
month_cols = [col for col in df_yearly.columns if str(col).isdigit() and len(str(col)) == 6]
# Iterate through each year and calculate the total
for year in years:
    # Match on the first four characters; a plain substring test would also pick up '202011' for 2011 and '202012' for 2012
    cols_to_sum = [col for col in month_cols if str(col)[:4] == str(year)]
    # Calculate the total number of crimes for each year
    df_yearly[str(year)] = df_yearly[cols_to_sum].sum(axis=1)
# Create a new dataframe that contains data for each year
df_yearly = df_yearly[["LSOA Code", "Borough", "Major Category"] + [str(year) for year in years]]
df_yearly = df_yearly.groupby(['LSOA Code', "Borough", 'Major Category']).sum().reset_index()
# show the result
df_yearly.head()
| | LSOA Code | Borough | Major Category | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E01000006 | E09000002 | ARSON AND CRIMINAL DAMAGE | 4.0 | 9.0 | 5.0 | 8.0 | 8.0 | 8.0 | 5.0 | 12.0 | 7.0 | 6.0 | 5.0 | 6.0 | 5.0 | 5.0 |
| 1 | E01000006 | E09000002 | BURGLARY | 20.0 | 17.0 | 14.0 | 10.0 | 10.0 | 7.0 | 13.0 | 16.0 | 13.0 | 5.0 | 5.0 | 3.0 | 3.0 | 2.0 |
| 2 | E01000006 | E09000002 | DRUG OFFENCES | 10.0 | 1.0 | 6.0 | 8.0 | 6.0 | 1.0 | 3.0 | 14.0 | 6.0 | 25.0 | 16.0 | 17.0 | 5.0 | 3.0 |
| 3 | E01000006 | E09000002 | MISCELLANEOUS CRIMES AGAINST SOCIETY | 0.0 | 1.0 | 1.0 | 4.0 | 0.0 | 1.0 | 2.0 | 2.0 | 4.0 | 1.0 | 3.0 | 5.0 | 1.0 | 2.0 |
| 4 | E01000006 | E09000002 | POSSESSION OF WEAPONS | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 2.0 | 0.0 | 2.0 |
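Selecting month columns with a plain substring test such as `str(year) in str(col)` is fragile with `YYYYMM` labels, because a four-digit year can also appear across the year/month boundary of another label. A short check comparing substring matching with a prefix test over the month labels covered by the data (April 2010 to January 2025):

```python
# Month labels in the same YYYYMM form as the crime tables (April 2010 to January 2025)
months = [f"{y}{m:02d}" for y in range(2010, 2026) for m in range(1, 13)]
months = [m for m in months if "201004" <= m <= "202501"]


def by_substring(year):
    # Matches the year anywhere inside the label
    return [m for m in months if str(year) in m]


def by_prefix(year):
    # Matches only labels whose first four characters are the year
    return [m for m in months if m[:4] == str(year)]


for year in range(2011, 2025):
    extra = sorted(set(by_substring(year)) - set(by_prefix(year)))
    if extra:
        print(year, "also matches", extra)
# prints:
# 2011 also matches ['202011']
# 2012 also matches ['202012']
```

With substring matching, November and December 2020 are each counted twice: once in 2020 and once in 2011 or 2012. The prefix test assigns every month to exactly one year.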
# Load the LSOA boundary map
UK_LSOA = gpd.read_file("https://github.com/YYY677/DSSS/raw/refs/heads/main/Assessment/data/LSOA.gpkg")
# Calculate the area of each LSOA in km² (assumes a metre-based projected CRS, so .area is in m²)
UK_LSOA['area_km2'] = UK_LSOA.area * 1e-6
UK_LSOA.head()
| | lsoa21cd | lsoa21nm | geometry | area_km2 |
|---|---|---|---|---|
| 0 | E01000001 | City of London 001A | POLYGON ((532151.538 181867.433, 532152.5 1818... | 0.129865 |
| 1 | E01000002 | City of London 001B | POLYGON ((532634.497 181926.016, 532632.048 18... | 0.228420 |
| 2 | E01000003 | City of London 001C | POLYGON ((532153.703 182165.155, 532158.25 182... | 0.059054 |
| 3 | E01000005 | City of London 001E | POLYGON ((533619.062 181402.364, 533639.868 18... | 0.189578 |
| 4 | E01000006 | Barking and Dagenham 016A | POLYGON ((545126.852 184310.838, 545145.213 18... | 0.146537 |
Methodology
This study combines temporal and spatial analyses to examine crime patterns across London. The methodology consists of the following key steps:
1. Data Aggregation and Preprocessing: Crime data were aggregated by type, year, and location (LSOA), then normalized by LSOA area to compute crime density (crimes per km²).
2. Temporal Trend Analysis: Line charts and heatmaps were used to explore annual and monthly fluctuations in total crime and in each major crime category over the study period.
3. Spatial Visualization: The spatial distribution of crime rates across LSOAs was mapped with quantile-based choropleth maps for each year, which also help identify persistent hotspots.
4. Cluster Analysis: Agglomerative hierarchical clustering was applied based on crime-type compositions, with a dendrogram used to choose the number of clusters.
5. Statistical Testing: Chi-square tests were conducted to assess associations between cluster types and crime density levels, testing whether the clusters follow a non-random spatial pattern.
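The Question 2 steps (composition, hierarchical clustering, chi-square test) can be sketched end to end. This is a minimal sketch under stated assumptions, not the study's code: the LSOA counts, areas and two underlying crime "profiles" are randomly generated, only four categories are used, Ward linkage is assumed, the number of clusters is fixed at 2 rather than read from a dendrogram, and the density levels are hypothetical terciles of crimes per km².

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)

# Synthetic LSOA x crime-type counts (hypothetical stand-in for one year of df_yearly)
categories = ["BURGLARY", "THEFT", "VIOLENCE AGAINST THE PERSON", "DRUG OFFENCES"]
n = 120
# Two made-up crime-mix profiles: theft-heavy areas and violence/drug-heavy areas
profile_a = np.array([0.15, 0.55, 0.20, 0.10])
profile_b = np.array([0.20, 0.20, 0.40, 0.20])
totals = rng.integers(50, 400, size=n)
profiles = np.where(np.arange(n)[:, None] < n // 2, profile_a, profile_b)
counts = np.vstack([rng.multinomial(t, p) for t, p in zip(totals, profiles)])
area_km2 = rng.uniform(0.05, 1.0, size=n)  # hypothetical LSOA areas

counts_df = pd.DataFrame(counts, columns=categories)

# Composition: each crime type as a share of the LSOA total, so clusters reflect crime mix, not volume
composition = counts_df.div(counts_df.sum(axis=1), axis=0)

# Ward-linkage agglomerative clustering on the composition vectors
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
cluster = pd.Series(model.fit_predict(composition), name="cluster")

# Density level: terciles of total crimes per km²
density = counts_df.sum(axis=1) / area_km2
density_level = pd.qcut(density, q=3, labels=["low", "medium", "high"]).rename("density_level")

# Chi-square test of independence between cluster membership and density level
table = pd.crosstab(cluster, density_level)
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.3f}")
```

Because the synthetic areas are drawn independently of the crime mix, a large p-value is the expected outcome here; in the real data, a small p-value would indicate that crime-mix clusters and density levels are associated.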
Flow chart of analysis for Question 1:
Flow chart of analysis for Question 2:
Results and discussion
Temporal distribution of crime rate
Trend chart of frequency
# Annual time series for each Major Category: sum the yearly totals across all LSOAs
drop_list = ['LSOA Code', 'Borough']
# Drop the identifier columns before grouping so that sum() does not concatenate their strings
cate_grouped_yearly = df_yearly.drop(columns=drop_list).groupby(by=['Major Category']).sum()
cate_grouped_yearly.head()
| Major Category | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARSON AND CRIMINAL DAMAGE | 75996.0 | 64730.0 | 54764.0 | 57314.0 | 61020.0 | 62106.0 | 61604.0 | 56324.0 | 55500.0 | 49982.0 | 51781.0 | 51942.0 | 56048.0 | 55138.0 |
| BURGLARY | 100836.0 | 97910.0 | 87187.0 | 75886.0 | 70489.0 | 68306.0 | 75975.0 | 80291.0 | 80362.0 | 60216.0 | 52318.0 | 51660.0 | 55434.0 | 52492.0 |
| DRUG OFFENCES | 64931.0 | 55755.0 | 50292.0 | 44400.0 | 39823.0 | 39592.0 | 36528.0 | 36084.0 | 47455.0 | 54571.0 | 45372.0 | 43358.0 | 38284.0 | 38063.0 |
| MISCELLANEOUS CRIMES AGAINST SOCIETY | 8768.0 | 8130.0 | 7727.0 | 8823.0 | 9979.0 | 11346.0 | 11190.0 | 11005.0 | 11143.0 | 11070.0 | 11130.0 | 11709.0 | 11003.0 | 10081.0 |
| POSSESSION OF WEAPONS | 5550.0 | 4286.0 | 3992.0 | 4322.0 | 4788.0 | 5843.0 | 7726.0 | 7685.0 | 7257.0 | 6725.0 | 5975.0 | 6278.0 | 6266.0 | 4522.0 |
# Get years
years = cate_grouped_yearly.columns.astype(int)
# Calculating the total count of crimes in each year
total_crimes = cate_grouped_yearly.sum()
# Plot trends for total count of crimes
plt.figure(figsize=(12, 8))
plt.plot(years, total_crimes, marker='o', linestyle='-', color='black', linewidth=2, label='Total Crimes')
# Plot trends for individual crime types
for category in cate_grouped_yearly.index:
    plt.plot(years, cate_grouped_yearly.loc[category], marker='o', linestyle='--', label=category, alpha=0.7)
plt.title("Total Crime Trend and Individual Categories", fontsize=14)
plt.xlabel("Year", fontsize=12)
plt.ylabel("Number of Crimes", fontsize=12)
plt.legend(fontsize=9, loc='upper right', bbox_to_anchor=(1.2, 1))
plt.grid(True, linestyle="--", alpha=0.6)
plt.tight_layout()
plt.show()
Monthly Heatmaps of Major Crime Categories
# Select only numeric columns (i.e., monthly crime counts)
numeric_cols = df.select_dtypes(include=['number']).columns
# Get all unique major crime categories
major_categories = df['Major Category'].unique()
# Loop through each crime category and plot one figure per category
for cat in major_categories:
    # Filter data for the current crime category
    filtered_data = df[df['Major Category'] == cat]
    # Re-confirm numeric columns within the filtered data
    numeric_cols = filtered_data.select_dtypes(include=['number']).columns
    # Sum crime counts across all LSOAs for each month
    monthly_sum = filtered_data[numeric_cols].sum()
    # Convert to DataFrame
    grouped_df = monthly_sum.reset_index()
    grouped_df.columns = ['Date', 'Count']
    # Extract month and year from the date string (e.g., "202401")
    grouped_df['Month'] = grouped_df['Date'].str[4:6].astype(int)
    grouped_df['Year'] = grouped_df['Date'].str[:4]
    # Pivot for heatmap: rows are months, columns are years
    monthly_pivot = grouped_df.pivot(index='Month', columns='Year', values='Count')
    # Plot heatmap
    plt.figure(figsize=(10, 6))
    sns.heatmap(monthly_pivot, cmap="YlGnBu", linewidths=.5)
    plt.title(f'Monthly Distribution of {cat} Crimes in London', fontsize=15)
    plt.xlabel('Year', fontsize=12)
    plt.ylabel('Month', fontsize=12)
    plt.tight_layout()
    plt.show()
📊 Time Series Analysis Summary
Overall Trends:
The general level of crime in London has fluctuated markedly in recent years, with a slightly increasing overall trend.
Top Crime Types by Volume:
The most prevalent types of crime by volume are:
- Theft
- Violence Against the Person
- Vehicle Offences
- Burglary
Crimes with a Noticeable Decrease in Frequency:
- Arson and Criminal Damage
- Burglary
- Drug Offences
Crimes with a Noticeable Increase in Frequency:
- Miscellaneous Crimes Against Society
- Possession of Weapons (rising from 3,992 in 2013 to a peak of 7,726 in 2017, then falling back to 4,522 in 2024)
- Public Order Offences
- Theft
- Violence Against the Person
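Labels such as "noticeable decrease" can be backed by simple numbers. A minimal sketch that copies the five category totals printed in the `cate_grouped_yearly` output above for 2013–2024 (the same span as the maps) and reports the percentage change between the end years, the peak year and the change from the peak; the choice of measures is an assumption, not the study's own criterion:

```python
import pandas as pd

# Annual totals copied from the cate_grouped_yearly output above (2013-2024, first five categories only)
totals = pd.DataFrame(
    {
        "ARSON AND CRIMINAL DAMAGE": [54764, 57314, 61020, 62106, 61604, 56324, 55500, 49982, 51781, 51942, 56048, 55138],
        "BURGLARY": [87187, 75886, 70489, 68306, 75975, 80291, 80362, 60216, 52318, 51660, 55434, 52492],
        "DRUG OFFENCES": [50292, 44400, 39823, 39592, 36528, 36084, 47455, 54571, 45372, 43358, 38284, 38063],
        "MISCELLANEOUS CRIMES AGAINST SOCIETY": [7727, 8823, 9979, 11346, 11190, 11005, 11143, 11070, 11130, 11709, 11003, 10081],
        "POSSESSION OF WEAPONS": [3992, 4322, 4788, 5843, 7726, 7685, 7257, 6725, 5975, 6278, 6266, 4522],
    },
    index=range(2013, 2025),
)

summary = pd.DataFrame({
    # Percentage change between the first and last years of the window
    "pct_change_2013_2024": (totals.loc[2024] / totals.loc[2013] - 1) * 100,
    # Year with the highest count, to expose rise-then-fall patterns
    "peak_year": totals.idxmax(),
    # How far the final year sits below the peak
    "pct_from_peak": (totals.loc[2024] / totals.max() - 1) * 100,
})
print(summary.round(1))
```

A single end-to-end change can hide a rise-then-fall pattern; reporting the peak year alongside it makes such cases, like Possession of Weapons, visible.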
Spatial distribution of crime rate
Plots of Crime Rates by Major Category and LSOA
# Function to plot crime rate distribution maps
def plot_crime_rate_by_lsoa(years, major_cat, n_quantiles=7):
    """
    Function to plot crime rate per LSOA for each year.
    Parameters:
        years: List of years to plot.
        major_cat: Crime category to plot.
        n_quantiles: Number of quantiles to split crime rate distribution.
    """
    filtered_data = df_yearly[df_yearly['Major Category'] == major_cat].copy()
    # Merge crime data with geographic data
    merged_cat = UK_LSOA.merge(filtered_data, left_on='lsoa21cd', right_on='LSOA Code', how='left')
    # Calculate area-normalized crime rate (crimes per km²) for each LSOA
    for year in years:
        merged_cat[f'{year}_area'] = merged_cat[str(year)] / merged_cat['area_km2']
    # Set layout: 3 columns per row
    ncols = 3
    num_years = len(years)
    nrows = -(-num_years // ncols)  # Ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(ncols * 5, nrows * 4.5))
    axes = axes.flatten()
    # Plot maps for each year
    for i, year in enumerate(years):
        ax = axes[i]
        year_str = f'{year}_area'
        # Drop NaNs for quantile calculation
        year_data = merged_cat[year_str].dropna()
        # Compute quantile-based bins
        _, bins = pd.qcut(year_data, q=n_quantiles, retbins=True, duplicates="drop")
        unique_cutpoints = np.unique(bins.round(2))
        unique_cutpoints = unique_cutpoints[1:-1]  # Drop min/max edge bins
        # Plot map
        merged_cat.plot(
            column=year_str, cmap='RdBu_r', linewidth=0, ax=ax,
            legend=True, scheme='user_defined',
            classification_kwds={'bins': unique_cutpoints},
            legend_kwds={
                'fontsize': 9,
                'markerscale': 0.5,
                'loc': 'lower right',
                'frameon': True,
                'title': 'Rate'
            },
            missing_kwds={
                'color': 'lightgrey',
                'hatch': '///',
                'label': 'Missing'
            }
        )
        ax.set_title(f'{major_cat} Rate\n{year} (per km²)', fontsize=12)
        ax.axis('off')
    # Remove unused subplots if any
    for i in range(num_years, len(axes)):
        fig.delaxes(axes[i])
    # Adjust spacing between subplots
    plt.subplots_adjust(hspace=0, wspace=0)
    plt.tight_layout()
    plt.show()
years = ['2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024']
major_categories = df_yearly['Major Category'].unique()
for major_cat in major_categories:
    plot_crime_rate_by_lsoa(years, major_cat)
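`plot_crime_rate_by_lsoa` bins each year by that year's own quantiles, so a class denotes the same relative position every year, which is what makes "persistent hotspots" comparable across maps. A minimal sketch on synthetic, right-skewed rates (the LSOA identifiers and values are hypothetical) of reproducing those classes with `np.digitize`, without the two-decimal rounding used in the plotting function, and of flagging LSOAs that stay in the top class every year. Defining a persistent hotspot as "top class in all years" is an assumption, not necessarily the study's criterion.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical crime rates (crimes per km²): 200 LSOAs x 3 years, right-skewed like real rates
lsoa_ids = [f"LSOA_{i:03d}" for i in range(200)]
base = rng.lognormal(mean=3.0, sigma=1.0, size=200)  # persistent between-LSOA differences
rates = pd.DataFrame(
    {year: base * rng.lognormal(mean=0.0, sigma=0.2, size=200) for year in ["2022", "2023", "2024"]},
    index=lsoa_ids,
)

n_quantiles = 7
classes = pd.DataFrame(index=lsoa_ids)
top_class = pd.DataFrame(index=lsoa_ids)
for year in rates.columns:
    # Quantile edges for this year only (min, inner cut points, max)
    _, edges = pd.qcut(rates[year], q=n_quantiles, retbins=True, duplicates="drop")
    inner = edges[1:-1]
    # Upper-bound classes: class k holds rates in (inner[k-1], inner[k]]; the last class lies above inner[-1]
    classes[year] = np.digitize(rates[year], inner, right=True)
    top_class[year] = classes[year] == len(inner)

# Persistent hotspot (assumed definition): in the top class in every year
persistent = top_class.all(axis=1)
print(classes["2024"].value_counts().sort_index())
print(f"{int(persistent.sum())} of {len(rates)} LSOAs are in the top class in every year")
```

The inner cut points act as upper bounds, matching how geopandas' `user_defined` scheme treats the `bins` passed to it, so class numbers here line up with the map legends (up to rounding).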